Mixed Model
Useful libraries.
library(rmarkdown) # You need this library to run this template.
library(epuRate) # Install with devtools: install_github("holtzy/epuRate", force=TRUE)
library(DT) # for nice dataframe displays
# library(readtext)
# library(tcltk2) # choose directory
library(readxl)
library(afex) # recent package, watch for updates
library(lme4)
library(RVAideMemoire)
library(LMERConvenienceFunctions)
library(ggplot2)
library(phia)
library(multcomp)
library(lmerTest)
library(emmeans)
library(ggpubr)
library(ggeffects)
library(phia)
library(splines)
library(DHARMa) # diagnostic lmer
library(tidyverse)
There are situations in which the analysis of variance and maximum likelihood, the main tool for mixed models, produce the same results. It is therefore tempting to start with ANOVA and move on to mixed models only when needed.
Part 1 - Why Mixed Effects Modeling?
What is ANOVA? When can I use it?
For experimentalists, the go-to statistical tool is the ANOVA. The way we learn ANOVA often feels like cooking or DIY: take means, sum squares in a specific way, split them across cells of a table, introduce some degrees-of-freedom, and if you have followed the recipe closely, get the magical p-values.
Typical ANOVA table
With time, we forget that the ANalysis Of VAriance is first and foremost based on a linear model:
\[Y = \beta_0 + \beta_1 X + \epsilon\]
where \(Y\) is a dependent variable, \(\beta_1\) is the slope (aka coefficient) for the independent variable (aka regressor or factor) \(X\), \(\beta_0\) is the intercept (i.e. the value \(Y\) when \(X=0\)). The equation simply formalizes how \(Y\) changes as a function of \(X\), with \(X\) representing experimental conditions, groups of subjects, covariables, etc.
Classical ANOVA uses a least squares method to solve the equation and estimate the values of the \(\beta\)s; it uses \(F\)-tests for significance. Because of that, it requires a number of assumptions that are rarely all fulfilled:
Balanced design
= same number of observations in each level of a factor. For example, the same number of subjects in all groups of a between-group design. This is rarely exactly true and sometimes far from it (e.g. a group of hard-to-find patients vs. a large group of controls).
No missing data
If data is missing in one of the cells (e.g. corrupted data for one condition in one subject), the degrees-of-freedom are no longer valid. Often the data of the affected subject must be dropped entirely.
Normal distribution
This assumption is required by the \(F\)-test. It is possible to circumvent it using generalized linear models, which can handle various common distributions, but that takes us far from the classical ANOVA.
Homoskedasticity
This assumption is also required by \(F\)-tests. Homoscedasticity is the equality of variance across values of the regressors (e.g. groups, levels of a factor, etc.): the error term (the “noise”, or random disturbance in the relationship between the independent variables and the dependent variable) is drawn from the same normal distribution across all values of the independent variables.
The violation of homoskedasticity often has a dramatic effect on the Type I error rate, while also decreasing power.
- Sphericity
This is the equivalent of homoskedasticity for repeated measures Anova (for an intuition of this, see https://stats.stackexchange.com/a/213433). Its violation has the same consequences. Sphericity states that variances of each of the pairwise contrasts should be the same. Example: consider the data below, obtained from a within-subject design.
We can see that the difference between treatments B and C varies across patients, but it goes in the same direction; similarly for the difference between treatments A and C. However, the difference between treatments A and B varies more across subjects; it even goes in opposite directions. Indeed, its variance is higher than that of the two other contrasts.
The sphericity assumption is rarely met in real data, hence the development of correction procedures (Greenhouse-Geisser, Huynh-Feldt…).
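The pairwise-contrast variances behind sphericity are easy to inspect by hand. A minimal sketch in base R (the treatment values below are invented for illustration; they are not the data shown above):

```r
# Hypothetical within-subject data: 5 patients under 3 treatments
A <- c(30, 35, 25, 15, 9)
B <- c(28, 30, 26, 20, 14)
C <- c(16, 23, 22, 18, 10)

# Sphericity requires the variances of all pairwise differences to be similar
sapply(list("A-B" = A - B, "A-C" = A - C, "B-C" = B - C), var)
```

Mauchly's test (mauchly.test() in base R) formalizes this check.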
ANOVA vs. linear regression
Most people either use ANOVA for categorical predictor variables, or regression for continuous predictor variables. But we have seen that ANOVA is based on a linear regression, so they are actually the same thing!
When you use ANOVA, you implicitly describe your data with a linear model.
Let’s check this with real data:
Anova:
load(file = file.path(getwd(), 'Datas/Raw_Datas.RData'))
anova.RT <- aov(RT ~ Time, data = Raw.Datas)
anova.RT$coefficients
## (Intercept)        Time
##   550.50148    11.65814
summary(anova.RT)
##             Df  Sum Sq Mean Sq F value Pr(>F)
## Time 1 101210 101210 4.435 0.0377 *
## Residuals 99 2259278 22821
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear Model:
lm.RT <- lm(RT ~ Time, data = Raw.Datas)
lm.RT$coefficients
## (Intercept)        Time
##   550.50148    11.65814
summary(lm.RT)
##
## Call:
## lm(formula = RT ~ Time, data = Raw.Datas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -286.969 -110.050 -9.069 118.755 295.075
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 550.501 30.110 18.283 <2e-16 ***
## Time 11.658 5.536 2.106 0.0377 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 151.1 on 99 degrees of freedom
## Multiple R-squared: 0.04288, Adjusted R-squared: 0.03321
## F-statistic: 4.435 on 1 and 99 DF, p-value: 0.03774
Coefficients, degrees-of-freedom, F-statistics and p-values are all exactly the same!
Beyond the simple linear model
Let’s imagine that you’re trying to estimate how RT evolves over Time. A simple linear model could be used to estimate this relationship: \[RT = \beta_0 + \beta_1 * Time + \epsilon\]
coef.lm <- round(coef(lm(RT ~ Time, data = Raw.Datas))[["Time"]])
We find an RT increase of 12 ms/day.
But there are different Subjects. And if we color by Subject:
It is clear that there are large variations in RT across Subjects. Sam, for example, is faster than Kevin.
It may be the case that each Subject has a different starting RT, while the day rate at which RT increase is consistent across Subjects. If we believe this to be the case, we would want to allow the intercept to vary by Subject:
\[RT = \beta_{0,i} + \beta_1 * Time + \epsilon\]
where \(i\) indexes the Subjects.
This strategy allows us to capture variation in initial RT of our subjects. The slope over time is still estimated globally, i.e. it is assumed to be the same for all subjects. Because some subjects increase while others decrease, it appears flat on the whole.
Let’s try to incorporate additional information into our model. It’s reasonable to imagine that the most realistic situation is a different start RT and increase (or decrease) at different rates depending on the Subject:
\[RT = \beta_{0,i} + \beta_{1,i} * Time + \epsilon\]
where \(i\) indexes the Subjects.
## (Intercept) Time
## Bob 488.5440 3.7491724
## Jack 606.5349 -0.6137286
## Jr 723.3436 -4.5767748
## Kevin 798.9889 2.1135422
## Sam 389.7864 -0.3285136
Can I use ANOVA?
- Balanced design? OK
- No missing data? KO
- Normal distribution? KO
- Homoskedasticity? KO
- Sphericity? KO
Part 2 - Mixed Models
The spirit
In Part 1 we have seen that to take into account variability across subjects and non-independence of data within each subject, we need to model intercepts and slopes for each subject.
Linear mixed models (LMMs) do exactly that. In addition, LMMs take into account that our subjects are a random sample drawn from a more general population. Therefore, LMMs model subjects’ intercepts and slopes as drawn from a normal distribution centered on the (estimated) population average:
\[\beta \sim \mathcal{N}(\beta_{mean},g)\] We can rewrite this by splitting it into a fixed term (a constant) and a random term:
\[\beta = \beta_{mean} + u\] \[u \sim \mathcal{N}(0,g)\]
LMMs are said to be mixed because they contain both fixed and random terms (aka effects).
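The fixed-plus-random decomposition above can be made concrete by simulating from it. A sketch in base R, with invented parameter values:

```r
set.seed(1)
n.subj <- 5; n.days <- 20
beta0.mean <- 600   # population-average intercept (baseline RT, ms)
beta1.mean <- 0.5   # population-average slope (ms per day)

# Random terms: each subject's deviation from the population averages
u0 <- rnorm(n.subj, 0, 150)   # random intercepts, SD = 150 ms
u1 <- rnorm(n.subj, 0, 4)     # random slopes, SD = 4 ms/day

sim <- expand.grid(Subject = 1:n.subj, Time = 0:(n.days - 1))
sim$RT <- (beta0.mean + u0[sim$Subject]) +
  (beta1.mean + u1[sim$Subject]) * sim$Time +
  rnorm(nrow(sim), 0, 40)     # residual noise
head(sim)
```

Fitting lmer(RT ~ Time + (Time | Subject), data = sim) on such data should approximately recover these parameters.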
Specifying the model
Fixed and random effects
The first thing to do when we define a LMM is to decide what are the fixed and random effects:
fixed: any variable whose impact on the dependent variable you care about. This includes experimental conditions, groups, covariates such as gender, age, performance, etc. + interactions of interest. In general, levels of fixed effects are all known. For instance, Gender has male and female and that’s it.
random: any variable that may introduce non-independence in the data, but whose levels are drawn randomly from a wider population. In our field it is typically Subject (or Animal).
In education research, if a test is carried out in multiple classes or schools, then Class and School are also random effects (in addition to Student).
In emotions or linguistic research, it is common to have a number of stimuli (e.g. 10 happy faces vs. 10 sad faces, or 20 words vs. 20 non-words) that do not exhaust the entire set of possible stimuli. Therefore, Stimulus (or Item) should also be included as a random factor.
Note that in some cases, an effect can be entered both as a fixed and a random effect. In education research for example, we might be interested both in generalizing over all schools (random effect) and in the differences between the schools that participated in the study (fixed effect).
Formula
Go to this page for a complete description of linear model syntax in R.
Fixed Effects
Formula for fixed effects
Random Effects
Formula for random effects
Intercept
Let’s go back to our sample data. RTs are continuous and non-negative, so we will later use the Gamma family.
The variation between Subjects is large; we must account for it in our model!
The possible levels of Subject are virtually infinite and we observe only 5 of them here, so we must take this into account to generalize our analysis.
Including (1|Subject) in the model means “assume an intercept that’s different for each subject”. 1 stands for the intercept here.
Like this, you specifically say that there are several RT per subject (repeated measurement). And these responses will depend on each subject’s baseline level. This effectively resolves the non-independence that stems from having multiple responses by the same subject.
model.01 <- lmer(RT ~ Time + (1|Subject), data = Raw.Datas)
summary(model.01)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: RT ~ Time + (1 | Subject)
## Data: Raw.Datas
##
## REML criterion at convergence: 1049
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.1790 -0.5480 0.1330 0.6741 3.3911
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 27289 165.19
## Residual 1653 40.65
## Number of obs: 101, groups: Subject, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 600.1988 74.3411 4.0757 8.074 0.00118 **
## Time 0.4318 1.5411 95.0387 0.280 0.77993
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Time -0.097
And we can get the intercept deviations for each Subject:
ranef(model.01)
## $Subject
## (Intercept)
## Bob -97.611277
## Jack 1.014798
## Jr 101.387299
## Kevin 208.558741
## Sam -213.349560
##
## with conditional variances for "Subject"
We’re not done yet. One of the coolest things about mixed models is coming up now, so hang on!
Slopes
Let’s have a look at the coefficients of the model by subject.
The coef() function in lme4 returns the conditional (posterior) modes of the \(\beta_i\): for a given subject \(i\), the estimate combines that subject’s data with knowledge of the overall distribution of the \(\beta_i\).
coef(model.01)$Subject["Time"]
##           Time
## Bob 0.4318244
## Jack 0.4318244
## Jr 0.4318244
## Kevin 0.4318244
## Sam 0.4318244
You see that each subject is assigned a different intercept, but the very same Time slope. That’s what we would expect, given that we’ve told the model with (1|Subject) to take by-subject variability into account.
In this model, we account for baseline-differences in RT, but we assume that whatever the effect of Time is, it’s going to be the same for all subjects (i.e. coef(model.01)$Subject[1,'Time']).
We saw previously that there is a subject-specific Time effect.
Including (Time|Subject) in the model means “assume an intercept that’s different for each subject” AND “assume a slope (Time effect) different for each subject”.
model.02 <- lmer(RT ~ Time + (Time|Subject), data = Raw.Datas)
summary(model.02)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: RT ~ Time + (Time | Subject)
## Data: Raw.Datas
##
## REML criterion at convergence: 1046.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3348 -0.5579 0.1290 0.6453 3.4166
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 28947.04 170.138
## Time 18.11 4.256 -0.35
## Residual 1550.03 39.370
## Number of obs: 101, groups: Subject, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 601.70674 76.53055 4.01136 7.862 0.0014 **
## Time 0.06553 2.43855 3.96166 0.027 0.9799
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Time -0.327
coef(model.02)$Subject["Time"]
##           Time
## Bob 4.0940869
## Jack -0.6358560
## Jr -5.0087610
## Kevin 1.5069956
## Sam 0.3711984
Here are the conditional estimated means and standard deviations (95% prediction intervals) for each random effect.
As we can see, the variation of the Time effect is smaller than the Subject variation. All intervals overlap, so we may question the validity of the random slope.
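A plot of these conditional means with their intervals (a “caterpillar plot”) can be obtained with the lattice method shipped with lme4, applied to model.02 from above:

```r
library(lme4)
library(lattice)
# Dotplot of each subject's conditional modes with 95% prediction intervals
dotplot(ranef(model.02, condVar = TRUE))
```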
But is that a valid assumption?
Now you have two models to compare with each other. One with the effect in question, one without the effect in question. We perform the likelihood ratio test using the anova() function:
anova(model.01, model.02)
## Data: Raw.Datas
## Models:
## model.01: RT ~ Time + (1 | Subject)
## model.02: RT ~ Time + (Time | Subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model.01 4 1070.0 1080.5 -530.99 1062.0
## model.02 6 1072.3 1088.0 -530.13 1060.3 1.7244 2 0.4222
You find a Chi-Square value, the associated degrees of freedom and the p-value.
Contrary to what the plots suggested, the random-effects term for Time is not significant in the presence of the by-Subject intercept term.
The p-value tests the hypothesis H0 = no difference between the ability of the two models to explain the data. Here p = 0.42 > 0.05: the random slope does not significantly improve the fit. The residual variance decreases only slightly (from 1653 to 1550), and the AIC and BIC even increase with the added complexity.
By the parsimony principle we could keep model.01; we nevertheless keep model.02 for the rest of this tutorial, since by-subject slopes are theoretically motivated (see below).
Which random slopes should I specify ?
You can almost always expect that people differ with how they react to an experimental manipulation. And likewise, you can almost always expect that the effect of an experimental manipulation is not going to be the same for all items. Therefore, not including the corresponding slopes would amount to ignore existing non-independence, something that is known to be a major issue in linear models (see part 1 for an example of incorrect inference drawn from such negligence). In particular, mixed models without random slopes have been shown through simulations in ecology (Schielzeth & Forstmeier, 2009) and psycholinguistics (Barr et al, 2013) to be anti-conservative, i.e. they have a relatively high Type I error rate.
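As an illustration of these guidelines, a design with a within-subject and within-item factor Condition would call for a “maximal” specification along these lines (Condition, Item, DV and my.data are hypothetical names, not objects from this tutorial):

```r
library(lme4)
# By-subject and by-item random intercepts, plus random Condition slopes
# for the within-unit manipulation (names are illustrative)
model.max <- lmer(DV ~ Condition + (Condition | Subject) + (Condition | Item),
                  data = my.data)
```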
Unfortunately there are situations where including all the relevant random slopes will lead to model failure. See Debugging to learn how to identify and address this issue.
Diagnostics
“Essentially, all models are wrong, but some are useful”
Most of the time a decent model is better than none at all. So take your model, try to improve on it, and then decide whether the accuracy is good enough to be useful for your purposes.
Plots empirical quantiles of a variable against theoretical quantiles of a comparison distribution (only for normal distributions):
qqPlot(residuals(model.02))
## [1] 101  45
Below, an example of residuals nicely random across fitted values:
plot(model.02)
If you can detect a clear pattern, shape or trend, or heterogeneity along the X-axis, then your model has room for improvement. Examples:
You may improve your model by:
- including additional predictors or higher polynomial orders to correct non-linearities
- transforming some predictors (log, square root…)
- assessing the impact of an outlier
- using generalized LMMs with an adequate family distribution
Diagnostics of GLMMs are a bit more difficult. The DHARMa library is dedicated to that: https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html
Significance
If you want to publish, you will most likely need to report some kind of \(p\)-value.
lmer() provides estimates, standard errors, residuals and \(t\)-values, but no \(p\)-values, because there is no universally valid method to calculate the required degrees-of-freedom for LMMs. The correct answer depends on both the structure of the data (balanced or not, full factorial or not, etc.) and the structure of the random effects. In many situations, there is not even a clear answer. For all these reasons, the main author of lme4 has decided not to provide any \(p\)-values.
However, other package authors have provided various methods. So-called conditional F-tests with approximated degrees of freedom are universally recommended. They are simple and have the best control of Type I error rate (i.e. false positives, see Luke 2017). Unfortunately they do not apply to generalized models, for which we need to fall back on other methods (see Significance/GLMMs).
There are 2 possible approximations of the df: Satterthwaite and Kenward-Roger. Kenward-Roger is the most reliable, but its time and RAM consumption grow steeply with the number of data points and model complexity. Satterthwaite is extremely fast and is virtually as good as KR, except in extreme situations of scarce data (such as a handful of participants, two 3-level factors and only one observation per cell; see Schaalje et al. 2002).
You will notice that the degrees of freedom are fractional: that is due to the fact that whole-plot and subplot variations are combined when standard errors are estimated.
Here’s the code in R:
library(lmerTest)
options(contrasts = c("contr.sum","contr.poly")) # absolutely necessary for interpretable type III ANOVA
anova(model, ddf = "Satterthwaite", type = "3")
or (same results but richer output):
summary(model)
Note that the anova() and summary() functions above are not base R; they have been overloaded by lmerTest for linear mixed models.
Part 3 - Generalized mixed models
Family
lmer is used to fit linear mixed-effect models, so it assumes that the residual error has a Gaussian distribution. But if your dependent variable is a binary outcome, then the error distribution is binomial and not Gaussian. In this case you have to use glmer.
LMMs are no more robust to such violations than any other linear model, but they can be easily extended to generalized linear mixed models (GLMMs) where a non-normal distribution can be specified.
You can test the distribution with several tools provided by R, but all are strongly limited. Our advice is to plot the density and follow the definitions. Examples of classic situations where GLMMs are needed:
- reaction times: they usually follow a Gamma or Inverse Gaussian distribution. More generally, whenever your outcome is continuous and non-negative, then you should consider a Gamma distribution.
The most common alternative is to transform the data to render it more normally distributed. Indeed, log or Box-Cox transforms work very well on reaction times. However, the transformation approach makes interpretation in terms of mental chronometry more difficult: what does it mean that 2 conditions differ by 3.5 units of log(RT)? See also Lo & Andrews 2015.
- accuracy obviously follows a binomial distribution. More generally, whenever the dependent variable takes on 2 values, you may consider the binomial distribution. The use of the binomial distribution can also be adapted to proportions, see this post by one of the authors of lme4.
- counts of an event in a given time interval (e.g. a specific behavior in an animal), or of an object in a given area (e.g. a specific cell in a sample of tissue) commonly follow a Poisson distribution.
- some data, such as responses to questionnaire items, are ordinal. GLMMs can be fitted on ordinal data using the package ordinal. The approach for this kind of model is very particular and beyond the scope of this presentation.
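To make the family choices concrete, here is a sketch fitting a binomial and a Poisson GLMM on simulated data (all values invented; Accuracy and Count are hypothetical columns, not part of this tutorial's dataset):

```r
library(lme4)
set.seed(7)
d <- data.frame(Subject = factor(rep(1:10, each = 20)),
                Time    = rep(0:19, times = 10))
d$Accuracy <- rbinom(nrow(d), 1, 0.8)  # binary outcome
d$Count    <- rpois(nrow(d), 3)        # event counts

# Binomial family with logit link for 0/1 accuracy
m.acc <- glmer(Accuracy ~ Time + (1 | Subject), data = d,
               family = binomial(link = "logit"))

# Poisson family with log link for counts
m.cnt <- glmer(Count ~ Time + (1 | Subject), data = d,
               family = poisson(link = "log"))
```

Because the simulated data contain no true between-subject variance, expect boundary (singular) fit warnings here; real data with genuine subject variability would not trigger them.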
For instance, in the previous data:
It’s easy to define a glmer: the call is exactly the same as lmer, with an additional family argument.
model.03 <- glmer(RT ~ Time + (1|Subject),
data = Raw.Datas,
family = Gamma(link = "identity"))
summary(model.03)
The link function provides the relationship between the linear predictor and the mean of the distribution function. It can be (options depend on the family):
- Identity
- Inverse
- log
- logit, probit
To better understand the link function, and the difference between a log link and log-transforming your data, see this Link.
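In short, a log link models the log of the mean, \(\log E[Y]\), while log-transforming the data models the mean of the log, \(E[\log Y]\); these differ whenever \(Y\) is skewed. A sketch on the tutorial's data:

```r
library(lme4)
# Log link: RT stays on its original scale, its mean is modeled on the log scale
m.link <- glmer(RT ~ Time + (1 | Subject), data = Raw.Datas,
                family = Gamma(link = "log"))

# Log transform: log(RT) is modeled directly, with Gaussian errors
m.trans <- lmer(log(RT) ~ Time + (1 | Subject), data = Raw.Datas)
```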
Residual interpretation for generalized linear mixed models (GLMMs) is often problematic. The DHARMa package aims at solving these problems by creating readily interpretable, standardized residuals for generalized linear (mixed) models, using a simulation-based approach.
DHARMa Residuals plot
Significance
For GLMMs, approximations of df are possible in theory but not implemented in \(R\). Thus, we need to fall back on alternatives (the following are also valid for simple LMMs but are not as good as the approximations of df):
- the \(t\)-as-\(z\) aka Wald chi-square method: the logic is that, if we have many observations, the df will be quite large and the distribution of our \(t\)-values will approach the \(z\)-distribution (= normal distribution). And because the squares of a normally distributed variable follow a chi-square distribution with 1 df (noted \(\chi^2(1)\)), we can do a Wald test. This method is virtually instantaneous but the most anti-conservative: it can increase the false positive rate quite a lot, especially with small datasets (<50 subjects). Code:
library(car)
Anova(model, test.statistic = "Chisq") # "Chisq" is the default value
- the likelihood ratio test (LRT) approach: the logic is to take the full model, drop one factor, and compare the two models to estimate the significance of that dropped factor. It is slightly less conservative than the Wald \(\chi^2\). It has been implemented in different base \(R\) and lme4 functions. However, the implementation in afex is the easiest to use:
library(afex)
mixed(formula, data, method = "LRT")
Parametric bootstrap is a variant of LRT that offers better control of the Type I error rate, but is more time consuming.
mixed(formula, data, method = "PB")
Part 4 - Debugging models
Model fitting
The tricky part in any statistical method is model fitting. Fitting a model means estimating its parameters/coefficients given the data.
Linear least squares (the procedure used for classic linear regressions) provides a closed-form solution to the fitting problem. This means that we can write down mathematical expressions for the model parameters. The only difficulty is to evaluate (= calculate) those expressions.
Ordinary Least Squares (OLS)
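As a reminder of what “closed-form” means, the OLS coefficients can be computed directly as \((X'X)^{-1}X'y\). A base-R sketch with invented data:

```r
set.seed(42)
x <- 1:10
y <- 2 + 3 * x + rnorm(10)

X <- cbind(1, x)                        # design matrix with an intercept column
beta <- solve(t(X) %*% X, t(X) %*% y)   # closed-form OLS solution

# Matches the coefficients returned by lm()
all.equal(as.numeric(beta), as.numeric(coef(lm(y ~ x))))
```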
Linear mixed models do not have closed-form solutions. Fitting LMMs is still an active area of research. Many approaches have been proposed and implemented in various R packages; lme4 is only one of them (see Ben Bolker’s table of implemented methods). They all draw on probability theory and numerical algebra, which means they have to approximate intractable integrals and do crazy stuff on big matrices. With the complexity of the task come many opportunities for things to go a bit weird or even plain wrong.
Preventive measures
While there are many ways to address the warnings frequently issued by lme4, even professional statisticians can not (yet) provide a clear, systematic strategy. However, three things are recommended to reduce the risk of warnings:
collect enough data to support the complexity of your model
scaling: if predictors differ widely in their range (e.g. mouse weight in kilograms and age in months), scale them using the scale() function so that they all have a mean of 0 and a standard deviation of 1
use the BOBYQA optimizer: it is the default for LMMs but not for GLMMs.
glmer(..., control = glmerControl(optimizer="bobyqa"))
Convergence issues
All procedures to fit LMMs are iterative. This means that there is an algorithm, called the optimizer, that repeats the same operations over and over until it is “satisfied”, i.e. it has converged.
Optimizers can be conceived as hikers who evolve in a landscape, looking for the lowest point. Their field of vision is very limited: they can only see one step ahead around them. At each iteration, they take the step that allows them to go down. They stop when they don’t know where to go next.
An optimizer walking down the parameter space
When an optimizer stops, it checks whether it has reached destination (= converged) by evaluating the local slope, or gradient. It is satisfied only if the gradient is close to 0 (think of it as a flat lake or sea level). Practically, it checks that it is smaller than a certain tolerance. If it is not, it will throw out a warning like this:
## Warning: Model failed to converge with max|grad| = 0.7741 (tol = 0.001)
This warning is extremely common. The authors of lme4 acknowledge that the tolerance they chose for the gradient is too strict. So there is a high chance of this warning being a false alarm. The following fixes can be tried sequentially until one works:
- if max|grad| < 0.01, you can safely ignore it. If you’re a risk-taker, you can even ignore it up to 0.1 or 1. Those numbers are arbitrary anyway.
- if applicable: scale predictors, use the BOBYQA optimizer (see Debugging - preventive measures)
- try multiple optimizers. If they all agree and provide the same results, you’re safe. Potentially a bit long if the model is complex. Code:
modelfit.all <- allFit(model)
summary(modelfit.all)
- restart the fit from the last value (= force the optimizer to resume walking)
if (isLMM(fm1)) {
  pars <- getME(fm1, "theta")
} else {
  # GLMM: requires both random and fixed parameters
  pars <- getME(fm1, c("theta", "fixef"))
}
m.restart <- update(fm1, start = pars)
- restart the fit from a slightly perturbed value (= nudge the optimizer to help it out of the crack where it was stuck)
# random sampling of new values within a 1% margin
pars2 <- runif(length(pars), pars/1.01, pars*1.01)
m.restart2 <- update(fm1, start = pars2)
Sometimes the optimizer is in such a difficult position that it cannot even evaluate the gradient. When that happens, it will throw out one (or several) of these messages:
## Warning: Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
## Warning: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, ... :
## unable to evaluate scaled gradient
## Warning: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, ... :
## Hessian is numerically singular: parameters are not uniquely determined
There is nothing specific you can do to address this issue (but you can try any of the things described above and below).
Remember that convergence warnings do not mean something is wrong, only that something might be wrong. Always double-check your data and model, and have a look at the estimates with summary() to see if anything looks weird.
For more explanations on optimizer convergence issues, see:
Boundary (singular) fit
Linear least squares methods throw out a result no matter what; they do not care whether the model is appropriate to the data. In contrast, LMMs can be sensitive to whether the model makes sense at all given the data, and in particular to how complex the model is compared to the data. If the data is insufficiently informative to properly fit the model, estimates may behave badly. Most often this will affect the variance-covariance matrix of the random effects, which might converge to extreme values (0 for variances, 1 or -1 for correlations). This is what lme4 calls a boundary (or singular) fit.
Note: when there are more than 2 variance-covariance parameters, boundary (singular) fits do not necessarily manifest as extreme values in the random effects’ parameters.
## Warning: boundary (singular) fit: see ?isSingular
This warning may mean one of two things:
- there is no intercept or slope variance in the data. For example, maybe the treatment effect is very homogeneous across your sample of subjects. In that case, you can drop the corresponding parameter from your model.
- there might be variance in your data, but not enough data to describe it accurately. This interpretation is more likely. It points to the fact that your design is somehow underpowered and that you are being too ambitious, something an ANOVA would never tell you.
According to this inspirational card, LMM is a better friend than ANOVA.
You can address the issue in 3 different ways:
- collect more data
- simplify the model
- go Bayesian: this is beyond this introduction, sorry!
For more explanations on boundary (singular) fit, see:
- ?isSingular in the lme4 manual
- Ben Bolker’s comments in his GLMM FAQ
Finding the right structure of random effects
Sound theoretical considerations and expert opinion tell us that we should include all within-subject experimental manipulations and their interactions as slopes in the by-subject random effects.
Experience with real life data shows that such models often fail to fit properly with our limited datasets. Fortunately, simulations indicate that in such situation a model can be simplified without increasing the Type I error rate (Matuschek et al. 2017).
Bates et al. 2015 discuss extensively how to estimate model complexity and find the sweet spot. The logic is to proceed iteratively, following one of two strategies:
- starting from a basic model (e.g., just a random intercept) and enriching up to the point where it becomes too complex
- starting from a complex (maximal) model and pruning it until it fits properly
At each step, you can make a decision based on two diagnostic tools:
likelihood ratio test (LRT) using anova(model1, model2), where model2 is a model that has one more random parameter than model1
summary(rePCA(model))
This function does a principal component analysis of the random effects variance-covariance matrix, along with the variance explained. If, say, 100% of the variance is explained with only 5 components out of 8, it suggests that you can safely remove 3 parameters. You just need to find which. Start with the most complex!
Part 5 - Post-Hoc
emmeans offers great functionality for both plotting and contrasts. It computes estimated marginal means (“least-squares means” or “predicted marginal means”), standard errors, confidence intervals and estimated df for specified factors or factor combinations in a (generalized) (mixed) linear model.
Huge list of model classes are supported by emmeans (to see more).
* anova
* linear model
* linear mixed models
* Generalized models
* bayesian models
* Manova
* GAM
* …
For LMMs, the possible values are “kenward-roger” and “satterthwaite”. The computation time of Kenward-Roger grows steeply with the number of observations (it involves inverting an N x N matrix), and emmeans caps it with the pbkrtest.limit option (see get_emm_option("pbkrtest.limit")). For large datasets, you should opt for the Satterthwaite method.
For GLMMs, tests and confidence intervals are asymptotic: all degrees of freedom are set to Inf, which is emmeans’s way of using z statistics rather than t statistics. Asymptotic methods tend to make confidence intervals a bit too narrow and p-values a bit too low, but they involve much, much less computation.
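A typical post-hoc chain with emmeans computes the marginal means and then the pairwise contrasts with a multiplicity adjustment. A sketch, where model and Condition are placeholders for your fitted model and its factor of interest:

```r
library(emmeans)
emm <- emmeans(model, ~ Condition, lmer.df = "satterthwaite")
pairs(emm, adjust = "tukey")   # all pairwise comparisons, Tukey-adjusted
```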
Categorical Variable
Main Effects Only
See emmeans vignette: Confidence intervals and tests
model.03.fact <- lmer(RT ~ as.factor(as.character(Time)) + (1|Subject), data = Raw.Datas)
emm.03.fact <- emmeans(model.03.fact, ~ Time, lmer.df = "satterthwaite")
emm.03.fact
## Time emmean SE df lower.CL upper.CL
## 0 587 76.1 4.35 382 792
## 1 603 76.4 4.42 398 807
## 2 615 75.4 4.19 410 821
## 3 591 75.4 4.19 385 797
## 4 602 75.3 4.16 396 808
## 5 608 75.7 4.26 402 813
## 6 616 75.8 4.28 411 822
## 7 595 75.5 4.22 390 801
## 8 601 75.6 4.23 395 806
## 9 602 75.6 4.24 397 807
##
## Degrees-of-freedom method: satterthwaite
## Confidence level used: 0.95
Emergence test (i.e. different from 0).
summary(emm.03.fact, infer=c(T,T)) # p-values and confidence intervals
## Time emmean SE df lower.CL upper.CL t.ratio p.value
## 0 587 76.1 4.35 382 792 7.708 0.0011
## 1 603 76.4 4.42 398 807 7.884 0.0009
## 2 615 75.4 4.19 410 821 8.163 0.0010
## 3 591 75.4 4.19 385 797 7.835 0.0012
## 4 602 75.3 4.16 396 808 7.996 0.0011
## 5 608 75.7 4.26 402 813 8.026 0.0010
## 6 616 75.8 4.28 411 822 8.128 0.0009
## 7 595 75.5 4.22 390 801 7.881 0.0011
## 8 601 75.6 4.23 395 806 7.945 0.0011
## 9 602 75.6 4.24 397 807 7.961 0.0011
##
## Degrees-of-freedom method: satterthwaite
## Confidence level used: 0.95
test(emm.03.fact) # only p-values = shortcut for infer=c(T,F)
## Time emmean SE df t.ratio p.value
## 0 587 76.1 4.35 7.708 0.0011
## 1 603 76.4 4.42 7.884 0.0009
## 2 615 75.4 4.19 8.163 0.0010
## 3 591 75.4 4.19 7.835 0.0012
## 4 602 75.3 4.16 7.996 0.0011
## 5 608 75.7 4.26 8.026 0.0010
## 6 616 75.8 4.28 8.128 0.0009
## 7 595 75.5 4.22 7.881 0.0011
## 8 601 75.6 4.23 7.945 0.0011
## 9 602 75.6 4.24 7.961 0.0011
##
## Degrees-of-freedom method: satterthwaite
confint(emm.03.fact) # only confidence intervals = shortcut for infer=c(F,T)
## Time emmean SE df lower.CL upper.CL
## 0 587 76.1 4.35 382 792
## 1 603 76.4 4.42 398 807
## 2 615 75.4 4.19 410 821
## 3 591 75.4 4.19 385 797
## 4 602 75.3 4.16 396 808
## 5 608 75.7 4.26 402 813
## 6 616 75.8 4.28 411 822
## 7 595 75.5 4.22 390 801
## 8 601 75.6 4.23 395 806
## 9 602 75.6 4.24 397 807
##
## Degrees-of-freedom method: satterthwaite
## Confidence level used: 0.95
Customize your test. For instance, to test whether RT is significantly higher than 440, with Bonferroni correction.
The side argument specifies whether the test is left-tailed (-1), right-tailed (1) or two-sided (2).
test(emm.03.fact, null = 440, side = 1, adjust = "bonferroni")
## Time emmean SE df null t.ratio p.value
## 0 587 76.1 4.35 440 1.928 0.6022
## 1 603 76.4 4.42 440 2.127 0.4695
## 2 615 75.4 4.19 440 2.327 0.3879
## 3 591 75.4 4.19 440 2.000 0.5650
## 4 602 75.3 4.16 440 2.153 0.4746
## 5 608 75.7 4.26 440 2.214 0.4358
## 6 616 75.8 4.28 440 2.326 0.3813
## 7 595 75.5 4.22 440 2.056 0.5271
## 8 601 75.6 4.23 440 2.124 0.4858
## 9 602 75.6 4.24 440 2.143 0.4747
##
## Degrees-of-freedom method: satterthwaite
## P value adjustment: bonferroni method for 10 tests
## P values are right-tailed
The adjust argument names the method used to adjust p-values or confidence limits (“holm”, “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr”, and “none”). This adjustment is always applied separately to each table or sub-table that you see in the printed output.
This object can now also be used to test whether there are differences between the levels of the factor. By default, the Tukey correction is used.
The adjust and side arguments are still available.
pairs(emm.03.fact)
## contrast estimate SE df t.ratio p.value
## 0 - 1 -15.8931 23.9 87.0 -0.666 0.9996
## 0 - 2 -28.7454 19.9 87.0 -1.444 0.9096
## 0 - 3 -4.0545 20.0 87.0 -0.202 1.0000
## 0 - 4 -15.4002 20.1 87.1 -0.765 0.9989
## 0 - 5 -20.8920 21.4 87.0 -0.976 0.9929
## 0 - 6 -29.6736 21.5 87.0 -1.382 0.9294
## 0 - 7 -8.5329 20.3 87.0 -0.421 1.0000
## 0 - 8 -13.8232 20.7 87.0 -0.669 0.9996
## 0 - 9 -15.3200 21.3 87.0 -0.721 0.9993
## 1 - 2 -12.8524 21.2 87.0 -0.607 0.9998
## 1 - 3 11.8386 21.4 87.0 0.552 0.9999
## 1 - 4 0.4928 20.8 87.0 0.024 1.0000
## 1 - 5 -4.9989 22.2 87.0 -0.226 1.0000
## 1 - 6 -13.7806 23.2 87.1 -0.594 0.9999
## 1 - 7 7.3601 22.3 87.1 0.330 1.0000
## 1 - 8 2.0699 22.1 87.1 0.094 1.0000
## 1 - 9 0.5731 21.7 87.0 0.026 1.0000
## 2 - 3 24.6909 17.1 87.0 1.446 0.9089
## 2 - 4 13.3452 16.9 87.0 0.789 0.9986
## 2 - 5 7.8534 18.5 87.0 0.425 1.0000
## 2 - 6 -0.9282 19.1 87.1 -0.049 1.0000
## 2 - 7 20.2125 17.8 87.0 1.135 0.9796
## 2 - 8 14.9222 18.1 87.0 0.827 0.9980
## 2 - 9 13.4254 18.3 87.0 0.733 0.9992
## 3 - 4 -11.3457 16.9 87.0 -0.669 0.9996
## 3 - 5 -16.8375 18.6 87.0 -0.906 0.9959
## 3 - 6 -25.6191 19.3 87.1 -1.325 0.9451
## 3 - 7 -4.4784 17.7 87.0 -0.252 1.0000
## 3 - 8 -9.7687 17.6 87.0 -0.554 0.9999
## 3 - 9 -11.2655 18.1 87.0 -0.621 0.9998
## 4 - 5 -5.4918 17.8 87.0 -0.308 1.0000
## 4 - 6 -14.2734 18.4 87.0 -0.775 0.9988
## 4 - 7 6.8673 17.5 87.1 0.393 1.0000
## 4 - 8 1.5770 18.0 87.1 0.088 1.0000
## 4 - 9 0.0802 17.6 87.0 0.005 1.0000
## 5 - 6 -8.7816 20.2 87.0 -0.435 1.0000
## 5 - 7 12.3591 19.2 87.0 0.643 0.9997
## 5 - 8 7.0688 19.5 87.1 0.362 1.0000
## 5 - 9 5.5720 19.3 87.0 0.288 1.0000
## 6 - 7 21.1407 18.9 87.0 1.117 0.9816
## 6 - 8 15.8505 20.4 87.1 0.779 0.9987
## 6 - 9 14.3536 20.4 87.1 0.703 0.9994
## 7 - 8 -5.2903 18.5 87.1 -0.286 1.0000
## 7 - 9 -6.7871 19.1 87.1 -0.355 1.0000
## 8 - 9 -1.4968 18.6 87.0 -0.081 1.0000
##
## Degrees-of-freedom method: satterthwaite
## P value adjustment: tukey method for comparing a family of 10 estimates
# We can also get estimated means and pairwise contrasts simultaneously:
emmeans(model.03.fact, pairwise ~ Time, lmer.df = "satterthwaite")
## $emmeans
## Time emmean SE df lower.CL upper.CL
## 0 587 76.1 4.35 382 792
## 1 603 76.4 4.42 398 807
## 2 615 75.4 4.19 410 821
## 3 591 75.4 4.19 385 797
## 4 602 75.3 4.16 396 808
## 5 608 75.7 4.26 402 813
## 6 616 75.8 4.28 411 822
## 7 595 75.5 4.22 390 801
## 8 601 75.6 4.23 395 806
## 9 602 75.6 4.24 397 807
##
## Degrees-of-freedom method: satterthwaite
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 0 - 1 -15.8931 23.9 87.0 -0.666 0.9996
## 0 - 2 -28.7454 19.9 87.0 -1.444 0.9096
## 0 - 3 -4.0545 20.0 87.0 -0.202 1.0000
## 0 - 4 -15.4002 20.1 87.1 -0.765 0.9989
## 0 - 5 -20.8920 21.4 87.0 -0.976 0.9929
## 0 - 6 -29.6736 21.5 87.0 -1.382 0.9294
## 0 - 7 -8.5329 20.3 87.0 -0.421 1.0000
## 0 - 8 -13.8232 20.7 87.0 -0.669 0.9996
## 0 - 9 -15.3200 21.3 87.0 -0.721 0.9993
## 1 - 2 -12.8524 21.2 87.0 -0.607 0.9998
## 1 - 3 11.8386 21.4 87.0 0.552 0.9999
## 1 - 4 0.4928 20.8 87.0 0.024 1.0000
## 1 - 5 -4.9989 22.2 87.0 -0.226 1.0000
## 1 - 6 -13.7806 23.2 87.1 -0.594 0.9999
## 1 - 7 7.3601 22.3 87.1 0.330 1.0000
## 1 - 8 2.0699 22.1 87.1 0.094 1.0000
## 1 - 9 0.5731 21.7 87.0 0.026 1.0000
## 2 - 3 24.6909 17.1 87.0 1.446 0.9089
## 2 - 4 13.3452 16.9 87.0 0.789 0.9986
## 2 - 5 7.8534 18.5 87.0 0.425 1.0000
## 2 - 6 -0.9282 19.1 87.1 -0.049 1.0000
## 2 - 7 20.2125 17.8 87.0 1.135 0.9796
## 2 - 8 14.9222 18.1 87.0 0.827 0.9980
## 2 - 9 13.4254 18.3 87.0 0.733 0.9992
## 3 - 4 -11.3457 16.9 87.0 -0.669 0.9996
## 3 - 5 -16.8375 18.6 87.0 -0.906 0.9959
## 3 - 6 -25.6191 19.3 87.1 -1.325 0.9451
## 3 - 7 -4.4784 17.7 87.0 -0.252 1.0000
## 3 - 8 -9.7687 17.6 87.0 -0.554 0.9999
## 3 - 9 -11.2655 18.1 87.0 -0.621 0.9998
## 4 - 5 -5.4918 17.8 87.0 -0.308 1.0000
## 4 - 6 -14.2734 18.4 87.0 -0.775 0.9988
## 4 - 7 6.8673 17.5 87.1 0.393 1.0000
## 4 - 8 1.5770 18.0 87.1 0.088 1.0000
## 4 - 9 0.0802 17.6 87.0 0.005 1.0000
## 5 - 6 -8.7816 20.2 87.0 -0.435 1.0000
## 5 - 7 12.3591 19.2 87.0 0.643 0.9997
## 5 - 8 7.0688 19.5 87.1 0.362 1.0000
## 5 - 9 5.5720 19.3 87.0 0.288 1.0000
## 6 - 7 21.1407 18.9 87.0 1.117 0.9816
## 6 - 8 15.8505 20.4 87.1 0.779 0.9987
## 6 - 9 14.3536 20.4 87.1 0.703 0.9994
## 7 - 8 -5.2903 18.5 87.1 -0.286 1.0000
## 7 - 9 -6.7871 19.1 87.1 -0.355 1.0000
## 8 - 9 -1.4968 18.6 87.0 -0.081 1.0000
##
## Degrees-of-freedom method: satterthwaite
## P value adjustment: tukey method for comparing a family of 10 estimates
Or a slightly more powerful method. Dealing with a General Linear Hypothesis, we use the model error (from the variance-covariance matrix of the fitted model’s parameters), not individual parts of the dataset (the as.glht function). We also use the “free” adjustment method from package multcomp, which takes the correlation of the model parameters into account.
summary(as.glht(pairs(emm.03.fact)), test=adjusted("free"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 0 - 1 == 0 -15.89306 23.85544 -0.666 0.999
## 0 - 2 == 0 -28.74543 19.90942 -1.444 0.907
## 0 - 3 == 0 -4.05451 20.02507 -0.202 1.000
## 0 - 4 == 0 -15.40023 20.13630 -0.765 0.998
## 0 - 5 == 0 -20.89201 21.41521 -0.976 0.992
## 0 - 6 == 0 -29.67364 21.46400 -1.382 0.925
## 0 - 7 == 0 -8.53292 20.26086 -0.421 1.000
## 0 - 8 == 0 -13.82319 20.65798 -0.669 0.999
## 0 - 9 == 0 -15.32001 21.26038 -0.721 0.999
## 1 - 2 == 0 -12.85237 21.18035 -0.607 1.000
## 1 - 3 == 0 11.83855 21.43971 0.552 1.000
## 1 - 4 == 0 0.49283 20.84324 0.024 1.000
## 1 - 5 == 0 -4.99894 22.16322 -0.226 1.000
## 1 - 6 == 0 -13.78058 23.20029 -0.594 1.000
## 1 - 7 == 0 7.36014 22.33518 0.330 1.000
## 1 - 8 == 0 2.06987 22.06825 0.094 1.000
## 1 - 9 == 0 0.57305 21.74574 0.026 1.000
## 2 - 3 == 0 24.69092 17.07641 1.446 0.907
## 2 - 4 == 0 13.34520 16.92202 0.789 0.998
## 2 - 5 == 0 7.85343 18.47765 0.425 1.000
## 2 - 6 == 0 -0.92821 19.12083 -0.049 1.000
## 2 - 7 == 0 20.21251 17.80748 1.135 0.977
## 2 - 8 == 0 14.92224 18.05012 0.827 0.997
## 2 - 9 == 0 13.42542 18.30952 0.733 0.999
## 3 - 4 == 0 -11.34572 16.94929 -0.669 0.999
## 3 - 5 == 0 -16.83749 18.58936 -0.906 0.995
## 3 - 6 == 0 -25.61913 19.33490 -1.325 0.941
## 3 - 7 == 0 -4.47841 17.73796 -0.252 1.000
## 3 - 8 == 0 -9.76868 17.63333 -0.554 1.000
## 3 - 9 == 0 -11.26550 18.14281 -0.621 0.999
## 4 - 5 == 0 -5.49177 17.80900 -0.308 1.000
## 4 - 6 == 0 -14.27341 18.40723 -0.775 0.998
## 4 - 7 == 0 6.86731 17.49000 0.393 1.000
## 4 - 8 == 0 1.57704 17.99649 0.088 1.000
## 4 - 9 == 0 0.08022 17.57269 0.005 1.000
## 5 - 6 == 0 -8.78164 20.19433 -0.435 1.000
## 5 - 7 == 0 12.35908 19.22190 0.643 0.999
## 5 - 8 == 0 7.06882 19.53183 0.362 1.000
## 5 - 9 == 0 5.57200 19.32921 0.288 1.000
## 6 - 7 == 0 21.14072 18.92116 1.117 0.979
## 6 - 8 == 0 15.85045 20.35456 0.779 0.998
## 6 - 9 == 0 14.35363 20.41572 0.703 0.999
## 7 - 8 == 0 -5.29027 18.49574 -0.286 1.000
## 7 - 9 == 0 -6.78709 19.10010 -0.355 1.000
## 8 - 9 == 0 -1.49682 18.55648 -0.081 1.000
## (Adjusted p values reported -- free method)
Interaction
We did not mention it before, but RT depends on the Time factor AND a Training factor (yes/no). Let’s imagine a significant interaction.
As usual, there is a quick, easy and powerful formula syntax. It should be of the form ~ specs | by
specs : names of the predictors over which marginal means are desired.
by : names of predictors to condition on.
Formula for fixed effects
model.04.fact <- lmer(RT ~ as.factor(as.character(Time))*Training + (1|Subject), data = Raw.Datas)
emm.04.fact <- emmeans(model.04.fact, specs = "Time", by = "Training")
emm.04.fact
## Training = no:
## Time emmean SE df lower.CL upper.CL
## 0 560 108 3.30 233 887
## 1 581 107 3.17 251 911
## 2 579 106 3.11 247 911
## 3 554 107 3.17 223 884
## 4 568 106 3.06 234 902
## 5 567 106 3.11 235 899
## 6 584 107 3.12 253 916
## 7 542 107 3.18 212 871
## 8 538 109 3.45 215 861
## 9 545 107 3.13 214 877
##
## Training = yes:
## Time emmean SE df lower.CL upper.CL
## 0 636 131 3.14 230 1041
## 1 579 135 3.62 187 971
## 2 671 130 3.11 264 1077
## 3 647 130 3.07 239 1056
## 4 633 132 3.31 233 1033
## 5 671 132 3.28 270 1072
## 6 651 132 3.31 251 1051
## 7 665 130 3.08 258 1073
## 8 664 130 3.05 255 1073
## 9 688 131 3.15 283 1093
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
Custom Contrasts
The user may write a custom contrast function for use in contrast.
In our example, the pairwise method gives too many tests across all 10 Time levels.
Only comparisons against baseline are needed.
des_cont <- list("0_1" = c(1, -1, 0, 0, 0, 0, 0, 0, 0, 0),
"0_2" = c(1, 0, -1, 0, 0, 0, 0, 0, 0, 0),
"0_3" = c(1, 0, 0, -1, 0, 0, 0, 0, 0, 0),
"0_4" = c(1, 0, 0, 0, -1, 0, 0, 0, 0, 0),
"0_5" = c(1, 0, 0, 0, 0, -1, 0, 0, 0, 0),
"0_6" = c(1, 0, 0, 0, 0, 0, -1, 0, 0, 0),
"0_7" = c(1, 0, 0, 0, 0, 0, 0, -1, 0, 0),
"0_8" = c(1, 0, 0, 0, 0, 0, 0, 0, -1, 0),
"0_9" = c(1, 0, 0, 0, 0, 0, 0, 0, 0, -1))
test(contrast(emm.04.fact, des_cont, adjust = "FDR"), side = ">")
## Training = no:
## contrast estimate SE df t.ratio p.value
## 0_1 -21.06 30.2 78.0 -0.698 0.8000
## 0_2 -18.49 28.5 78.0 -0.648 0.8000
## 0_3 6.47 30.6 78.0 0.211 0.8000
## 0_4 -7.82 27.0 78.0 -0.290 0.8000
## 0_5 -7.28 28.6 78.0 -0.255 0.8000
## 0_6 -24.18 28.6 78.0 -0.846 0.8000
## 0_7 18.55 30.4 78.0 0.610 0.8000
## 0_8 22.19 37.6 78.0 0.589 0.8000
## 0_9 14.74 29.4 78.0 0.502 0.8000
##
## Training = yes:
## contrast estimate SE df t.ratio p.value
## 0_1 57.11 46.8 78.0 1.219 0.9522
## 0_2 -34.90 27.3 78.0 -1.280 0.9522
## 0_3 -11.47 25.6 78.0 -0.448 0.9522
## 0_4 3.08 37.0 78.0 0.083 0.9522
## 0_5 -35.20 35.4 78.0 -0.995 0.9522
## 0_6 -15.13 35.4 78.0 -0.428 0.9522
## 0_7 -29.52 26.3 78.0 -1.124 0.9522
## 0_8 -28.33 25.2 78.0 -1.123 0.9522
## 0_9 -52.17 30.9 78.1 -1.687 0.9522
##
## Degrees-of-freedom method: kenward-roger
## P value adjustment: fdr method for 9 tests
## P values are right-tailed
Here we use only the custom contrasts (des_cont), with FDR correction and a right-tailed (>) test.
Display
afex_plot() visualizes results from factorial experiments combining estimated marginal means and uncertainties associated with the estimated means in the foreground.
plot(emm.04.fact,
     comparison = TRUE,
     CIs = TRUE,
     horizontal = FALSE)
afex_plot(model.04.fact,
          "Time", "Training",
          error = "model",
          error_ci = FALSE,
          data_geom = ggplot2::geom_boxplot,
          mapping = c("linetype", "shape", "fill")) +
  facet_wrap(~Training)
Error bars provide a graphical representation of the variability of the estimated means. In the case of a mixed design, no single type of error bar allows comparison across all crossed random conditions. There are several possible measures of variability:
- confidence intervals (model-based)
- standard errors (model-based)
In the case of designs involving repeated-measures factors, these measures cannot be used to judge significant differences, as this requires knowledge of the correlation between measures. error = "within" allows judging differences between repeated-measures conditions.
model.05.fact <- lmer(RT ~ as.factor(as.character(Time))*Training + (as.factor(as.character(Time))|Subject), data = Raw.Datas)
afex_plot(model.05.fact,
"Time", "Training",
error = "within",
error_ci = FALSE,
data_geom = ggplot2::geom_boxplot,
mapping = c("linetype", "shape", "fill")) +
facet_wrap(~Training)
Continuous numerical variable
The emtrends function is useful when a fitted model involves a continuous numerical predictor interacting with another predictor.
Linear model
model.04.num <- lmer(RT ~ Training*Time + (Time|Subject), data = Raw.Datas)
emm.01 <- emtrends(model.04.num,
specs = "Training", var = "Time",
options = list())
emm.01
## Training Time.trend SE df lower.CL upper.CL
## no -3.59 2.75 3.29 -11.91 4.72
## yes 4.51 2.99 2.38 -6.57 15.60
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
Emergence test: slope significance.
summary(emm.01, infer=c(T,T))
## Training Time.trend SE df lower.CL upper.CL t.ratio p.value
## no -3.59 2.75 3.29 -11.91 4.72 -1.308 0.2746
## yes 4.51 2.99 2.38 -6.57 15.60 1.510 0.2506
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
To display these slopes, we have to plot the predicted values.
plot(ggpredict(model.04.num, terms = c("Time", "Training")))
Or we could use the emmip() function.
emmip(model.04.num,
Training ~ Time,
cov.reduce = range)
Note that the cov.reduce = range argument is passed because, by default, each covariate is reduced to only one value (the mean). Instead, we call the range() function to obtain the minimum and maximum of Time.
Test difference between slopes:
emtrends(model.04.num, pairwise ~ Training, var="Time")
## $emtrends
## Training Time.trend SE df lower.CL upper.CL
## no -3.59 2.75 3.29 -11.91 4.72
## yes 4.51 2.99 2.38 -6.57 15.60
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## no - yes -8.11 4.06 2.73 -1.997 0.1486
##
## Degrees-of-freedom method: kenward-roger
Test difference between factor levels at different points of the continuous variable:
emm.02 <- emmeans(model.04.num, ~ Time | Training,
at = list(Time = c(0, 5, 9)))
pairs(emm.02)
## Training = no:
## contrast estimate SE df t.ratio p.value
## 0 - 5 18.0 13.7 3.29 1.308 0.4753
## 0 - 9 32.4 24.7 3.29 1.308 0.4753
## 5 - 9 14.4 11.0 3.29 1.308 0.4753
##
## Training = yes:
## contrast estimate SE df t.ratio p.value
## 0 - 5 -22.6 14.9 2.38 -1.510 0.4251
## 0 - 9 -40.6 26.9 2.38 -1.510 0.4251
## 5 - 9 -18.1 12.0 2.38 -1.510 0.4251
##
## Degrees-of-freedom method: kenward-roger
## P value adjustment: tukey method for comparing a family of 3 estimates
All arguments are still available.
des_cont <- list("0-5" = c(1, -1, 0),
                 "0-9" = c(1, 0, -1))
test(contrast(regrid(emm.02), des_cont, adjust = "FDR"), side = ">")
## Training = no:
## contrast estimate SE df t.ratio p.value
## 0-5 18.0 13.7 3.00 1.308 0.1410
## 0-9 32.4 24.7 3.00 1.308 0.1410
##
## Training = yes:
## contrast estimate SE df t.ratio p.value
## 0-5 -22.6 14.9 3.00 -1.510 0.8859
## 0-9 -40.6 26.9 2.98 -1.510 0.8859
##
## Degrees-of-freedom method: inherited from kenward-roger when re-gridding
## P value adjustment: fdr method for 2 tests
## P values are right-tailed
Non-Linear model
library(splines)
model.05.num <- lmer(RT ~ Training*ns(Time,2) + (ns(Time,2)|Subject), data = Raw.Datas)
There are many functions that generate non-linear behaviour. Here we use ns(), which builds a natural spline basis; its second argument sets the degrees of freedom, which determines the number of knots (breakpoints) defining the spline. ns(Time, 2) is close to a polynomial of degree 2.
With a spline we can no longer test slope differences directly. Instead, we test differences between factor levels at different points of the continuous variable.
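A hedged sketch of that strategy, using sleepstudy plus a made-up two-level factor grp (both the factor and the chosen time points are illustrative assumptions, not this tutorial's data):

```r
library(lme4)
library(splines)
library(emmeans)

# Hypothetical example: sleepstudy with an artificial grouping factor
d <- sleepstudy
d$grp <- factor(rep(c("ctrl", "train"), length.out = nrow(d)))

# Natural spline of the continuous predictor, interacting with the factor
m <- lmer(Reaction ~ grp * ns(Days, 2) + (1 | Subject), data = d)

# No single slope exists for a spline, so compare the groups
# at a few chosen values of the continuous variable instead
emm <- emmeans(m, ~ grp | Days, at = list(Days = c(0, 4, 9)))
pairs(emm)
```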
Ordinal variable
Part 6 - FAQ Reviewers
Effect size
Why did you use a GLM?
For instance: “In most previous studies of the attributional model, ANOVAs were used to analyze the effects of experimentally manipulated factors (…). Therefore, the use of these methods needs to be better justified and the methods need to be explained in more detail. In addition, the traditional analyses should also be reported to judge whether results comparable to those found in previous studies are obtained with these methods.”
Power
Only simulation can provide a power estimate.
Bates, Kliegl, Vasishth, and Baayen (2015) said that the GLM is the best way to deal with heterogeneity, “as it results in the highest power without an increase in Type I errors”.
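One way to run such a simulation is the simr package; a minimal sketch on lme4's sleepstudy data (the model is an assumption, and nsim is kept unrealistically small just to illustrate the call):

```r
library(lme4)
library(simr)

m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Simulation-based power for the fixed effect of Days
# (use nsim >= 1000 for a real power analysis)
powerSim(m, test = fixed("Days"), nsim = 10)
```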
R2
Here is a paper about R2 and twelve properties of the “traditional” R2 for regression models: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.2041-210x.2012.00261.x
With r.squaredGLMM() (package MuMIn), we calculate the conditional and marginal coefficients of determination for GLMs.
Marginal R2 represents the variance explained by the fixed effects.
Conditional R2 is interpreted as the variance explained by the entire model, including both fixed and random effects.
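A minimal sketch on lme4's sleepstudy data (not this tutorial's model):

```r
library(lme4)
library(MuMIn)

m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# R2m: marginal (fixed effects only); R2c: conditional (fixed + random)
r.squaredGLMM(m)
```

By construction the conditional R2 is at least as large as the marginal R2, since it adds the random-effect variance to the explained part.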
Bayesian
I am always looking for a way to talk about the power of analysis with GLMs. I wonder if the wonderful world of Bayesian statistics can help us. Here is the theory (if there are experts who can correct me, I await your feedback):
In the frequentist world, power analysis can help disentangle these alternatives:
evidence of absence
evidence of presence
absence of evidence
In our analysis (GLM), power is a tricky question. But Bayesian approach could be helpful (https://www.nature.com/articles/s41593-020-0660-4).
The Bayes Factor (BF) should primarily be seen as a continuous measure of evidence.
BF quantifies the degree to which the data warrant a change in beliefs, and it therefore represents the strength of evidence that the data provide for H0 vs H1.
The advantage of retaining a continuous representation of evidence was stressed by Rozeboom
(http://stats.org.uk/statistical-inference/Rozeboom1960.pdf):
“The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true.”
Users can judge the strength of the evidence directly from the numerical value of the BF, with a BF twice as high providing evidence twice as strong. In contrast, it can be difficult to interpret an actual p-value as strength of evidence, as P = 0.01 does not provide five times as much evidence as P = 0.05.
Jeffreys proposed reference values to guide the interpretation of the strength of the evidence. These values were spaced out in exponential half steps of 10 (10^0.5 ≈ 3, 10^1 = 10, 10^1.5 ≈ 30, etc.), to be equidistant on a log scale.
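For a concrete feel for these reference values, here is a toy example with the BayesFactor package (the simulated data and effect size are assumptions, not this tutorial's data):

```r
library(BayesFactor)

set.seed(42)
x <- rnorm(30, mean = 0.5, sd = 1)  # simulated data with a true effect

# BF10: evidence for a non-zero mean relative to the point null
bf <- ttestBF(x, mu = 0)
bf

# extractBF(bf)$bf gives the numeric BF, to compare against
# Jeffreys' thresholds (~3 moderate, ~10 strong, ~30 very strong)
extractBF(bf)$bf
```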
The Bayesian approach provides several metrics.
The Probability of Direction (pd) is an index of effect existence representing the certainty with which an effect goes in a particular direction (i.e., is positive or negative). Beyond its simplicity of interpretation, understanding and computation, this index also presents other interesting properties. It is independent from the model, i.e., it is solely based on the posterior distributions and does not require any additional information from the data or the model. The pd is not relevant for assessing the size or importance of an effect and is not able to provide information in favor of the null hypothesis. In other words, a high pd suggests the presence of an effect but a small pd does not give us any information about how plausible the null hypothesis is, suggesting that this index can only be used to eventually reject the null hypothesis (which is consistent with the interpretation of the frequentist p-value). In contrast, BFs increase or decrease as the evidence becomes stronger.
Conclusion
Where classical ANOVA uses a least squares solution, LMMs use a maximum likelihood solution (details here). As a consequence, LMMs do not need balanced and complete data. A tremendous advantage of this is that we no longer need to pool and average subject data as we do in ANOVAs: with LMMs we can model all trials!
LMMs are also very robust to violations of homoskedasticity and sphericity (e.g. Arnau et al. 2013). With a balanced design, no missing data and sphericity, ANOVA and LMMs yield the same results.
The default parameter estimation criterion for linear mixed models is restricted (or “residual”) maximum likelihood (REML): the likelihood of the parameters, given the observed data. You can read more about the LMM fitting procedure here.
LMM have many advantages over ANOVA, including (but not limited to):
- Analysis of single trial data (as opposed to aggregated means per condition).
- Specifying more than one random factor.
- The use of continuous variables as predictors.
- Making you look like you know what you’re doing.
- Defeating the reviewer.
- The ability to specify custom models
Does using lmer() make one a Bayesian? No. But with model parameters that are random, the use of posterior distributions, and shrinkage of estimates towards a more stable target, life suddenly feels more Bayesian.